Optical flow estimation using 4-dimensional cost volume processing

ABSTRACT

Techniques are provided for estimation of optical flow between images using 4-dimensional cost volume processing. A methodology implementing the techniques according to an embodiment includes extracting a first set of feature vectors from a first image and extracting a second set of feature vectors from a second image. Each feature vector of the first set is associated with a pixel of the first image and each feature vector of the second set is associated with a pixel of the second image. The method further includes constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors. The method further includes performing a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image.

BACKGROUND

Optical flow estimation generally provides a mapping between the pixels of two or more images, for example to identify motion of objects in sequential image frames received from a video camera. This can be useful in computer vision and robotics systems. Existing optical flow estimation systems typically employ nearest neighbor searching and coarse-to-fine analysis techniques. These systems, however, can be computationally intensive and can sometimes produce unacceptable results, particularly when there are large displacements (e.g., motion) between image frames, texture-less regions in the images, and/or motion blur.

BRIEF DESCRIPTION OF THE DRAWINGS

Features and advantages of embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the Drawings, wherein like numerals depict like parts.

FIG. 1 is a top-level block diagram of an implementation of an optical flow estimation system, configured in accordance with certain embodiments of the present disclosure.

FIG. 2 illustrates an estimated optical flow field, in accordance with certain embodiments of the present disclosure.

FIG. 3 is a more detailed block diagram of an optical flow estimation system, configured in accordance with certain embodiments of the present disclosure.

FIG. 4 illustrates a 4D cost volume, in accordance with certain embodiments of the present disclosure.

FIG. 5 is a block diagram of a training system for the convolutional neural network, configured in accordance with certain embodiments of the present disclosure.

FIG. 6 is a flowchart illustrating a methodology for optical flow estimation, in accordance with certain embodiments of the present disclosure.

FIG. 7 is a block diagram schematically illustrating a platform to perform optical flow estimation, configured in accordance with certain embodiments of the present disclosure.

Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent in light of this disclosure.

DETAILED DESCRIPTION

Generally, this disclosure provides techniques for optical flow estimation that provide a mapping between the pixels of two or more images. The generated mapping is in the form of a vector field, where each vector describes the motion of an associated pixel over the elapsed time between sequential image frames. The techniques allow for improved optical flow estimation accuracy by constructing and employing a full four-dimensional (4D) cost volume, as will be explained below. The techniques also allow for exploitation of parallel processing capabilities and other calculation efficiencies, to provide improved computational performance, as will also be described in greater detail below. The resulting estimated optical flow field may be used for a variety of applications including video segmentation, motion detection, object tracking, action recognition, and autonomous driving.

The disclosed techniques can be implemented, for example, in a computing system or a software product executable or otherwise controllable by such systems, although other embodiments will be apparent. The system or product is configured to provide estimation of optical flow between two or more images using 4-dimensional cost volume processing. In accordance with an embodiment, a methodology to implement these techniques includes extracting a first set of feature vectors from a first image and extracting a second set of feature vectors from a second image. Each feature vector of the first set is associated with a pixel of the first image and each feature vector of the second set is associated with a pixel of the second image. The method also includes constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors. The method further includes performing a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image and to generate an estimated optical flow field from the estimated optical flow vectors.

As will be appreciated, the techniques described herein may allow for improved optical flow estimation, compared to existing methods that rely on nearest neighbor searching and coarse-to-fine analysis techniques. The disclosed techniques can be implemented on a broad range of platforms including laptops, tablets, smart phones, workstations, and imaging devices. These techniques may further be implemented in hardware or software or a combination thereof.

FIG. 1 is a top-level block diagram of an image processing apparatus 100, configured to implement an optical flow estimation system in accordance with certain embodiments of the present disclosure. An optical flow estimation system 110 is shown as being configured to receive two image frames, 102 and 104, and to generate an estimated optical flow field 120. The image frames may be provided by any suitable imaging source including, for example, a camera, a video camera, a scanner, or a database of images. The image frames 102, 104 will typically capture a scene at sequential moments in time where some portions of the scene (e.g., people, objects, and/or background) are in motion over the time period between frames. The estimated optical flow field 120 comprises a set of vectors that describe the motion of pixels between the first and second image frames. The optical flow estimation system 110 includes a convolutional neural network (CNN) configured to perform feature extraction on the images, as will be explained in greater detail below, and the training system 130 is configured to train the CNN, as will also be explained below.

Although only two image frames are shown, it will be appreciated that the optical flow estimation system 110 may operate on any number of pairs of image frames to produce any desired number of estimated optical flow fields 120. The resulting fields 120 may then be provided to one or more image processing applications 140 such as, for example, a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, or a computer vision application.

FIG. 2 illustrates an estimated optical flow field 120, in accordance with certain embodiments of the present disclosure. Image frame 1 102 and image frame 2 104 are shown to include pixels 202 labeled P1, P2, . . . PN. Only three pixels are shown, for simplicity of illustration, but it will be understood that the image frames generally comprise large numbers of pixels, for example on the order of millions of pixels or more. Image frame 2 104 captures a representation of a scene at a point in time subsequent to that of image frame 1 102. As such, some portions of the image may move and this is shown by arrows 204 representing pixel motion from frame to frame. The estimated optical flow field 120 is also shown with flow vectors V1, V2, . . . VN 206 attached to each of the pixels P1, P2, . . . PN 202.

FIG. 3 is a more detailed block diagram of an optical flow estimation system 110, configured in accordance with certain embodiments of the present disclosure. The system 110 is shown to include an image down-sampling circuit 202, a feature extraction circuit 204 including a convolutional neural network (CNN), a cost volume construction circuit 206, a cost volume processing circuit 208, an up-sampling circuit 210, and a post processing circuit 212. A training system 130 for the CNN of the feature extraction circuit 204 is also shown, the operation of which is described in greater detail below in connection with FIG. 5.

The image down-sampling circuit 202 is configured to down-sample a first received image frame 102 and a second received image frame 104, from an original resolution (X×Y) to a selected lower resolution (M×N). In some embodiments, the selected lower resolution may be on the order of one third of the original resolution (e.g., M=X/3, N=Y/3). The degree of down-sampling may be adjusted based on computational requirements and accuracy requirements.

The feature extraction circuit 204 is configured to extract a first set of feature vectors F₁ 224 from the first down-sampled image I₁ 220, and to extract a second set of feature vectors F₂ 226 from the second down-sampled image I₂ 222. Each feature vector of the first set F₁ is associated with a pixel of the first down-sampled image I₁ and each feature vector of the second set F₂ is associated with a pixel of the second down-sampled image I₂.

In some embodiments, the feature extraction is performed by a convolutional neural network (CNN), the training of which is described below in connection with FIG. 5. In some embodiments, the CNN may be configured with four convolutional layers, wherein the first three layers employ 64 filters of size 3×3 and the last layer employs d filters, where d is the dimensionality of the feature vectors which is chosen as a trade-off between expressive power and computational cost. In some embodiments, the chosen dimensionality d may be 64, 32, 16, or 10, which values have been found to produce positive results in experimental testing. The use of a relatively small dimensionality d may contribute to the computational efficiency of the disclosed techniques, compared to traditional image processing techniques that typically use 1024 or more features. Additionally, the CNN may be configured with a relatively small receptive field of, for example, on the order of 9×9 pixels, which corresponds to an induced receptive field in the original image of 27×27 pixels (for a down-sampling factor of 3). The use of a small receptive field may further contribute to the computational efficiency of the disclosed techniques.
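By way of illustration only, the receptive field figures given above can be checked with a short calculation. The sketch below assumes that all four convolutional layers use 3×3 kernels with unit stride and no pooling (the kernel size of the last layer and the strides are assumptions made here for the example); the function name `receptive_field` is hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels, of a stack of convolutional layers."""
    strides = strides or [1] * len(kernel_sizes)
    field, jump = 1, 1
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * jump   # each layer widens the field
        jump *= stride                 # spacing between adjacent outputs
    return field


# Assumed configuration: four 3x3 layers, stride 1, no pooling.
rf_low = receptive_field([3, 3, 3, 3])   # in the down-sampled image
rf_orig = rf_low * 3                     # down-sampling factor of 3
print(rf_low, rf_orig)                   # 9 27
```

Under these assumptions the result agrees with the 9×9 receptive field and the induced 27×27 receptive field described above.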

The cost volume construction circuit 206 is configured to construct a 4D cost volume 228 to store a distance metric between each feature vector of the first set (M×N) of feature vectors F₁ and a selected subset (R×R) of feature vectors of the second set of feature vectors F₂. The size of the selected subset, R, is chosen to represent the maximum displacement of pixels (from frame to frame) that needs to be considered. The 4D cost volume 228 is illustrated in FIG. 4.

The cost volume construction circuit 206 is further configured to normalize the extracted feature vectors, of the cost volume 228, to unity length, and calculate the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors. In some embodiments, the cost volume construction circuit is further configured to rescale and bin the distance metrics of the 4D cost volume to a selected integer range, such as, for example, eight bits. The use of a vector dot product, and the bit range restriction (e.g., allowing for fixed point arithmetic), improves the computational efficiency of the disclosed techniques. Additionally, parallel processing techniques (e.g., using parallel processors and/or parallel threads) may be exploited to further improve computational efficiency, particularly since the regularity of the data in the feature vector spaces and the cost volume lends itself to parallel processing.
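The construction described above may be sketched, in a non-limiting way, as follows. The sketch is written in plain Python for readability rather than speed, and it makes several assumptions that are not taken from this disclosure: the R×R subset is a window centered on each pixel with R = 2·max_disp + 1, out-of-image candidates receive the worst cost, and the stored metric is 1 − a·b, which for unit-length vectors equals half the squared Euclidean distance. The names `normalize`, `build_cost_volume`, and `quantize` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_cost_volume(f1, f2, max_disp):
    """Return cost[y][x][dy][dx] for M x N grids of feature vectors.

    Assumption: the window index (dy, dx) maps to a displacement of
    (dy - max_disp, dx - max_disp), so R = 2 * max_disp + 1.
    """
    m, n = len(f1), len(f1[0])
    r = 2 * max_disp + 1
    f1 = [[normalize(v) for v in row] for row in f1]
    f2 = [[normalize(v) for v in row] for row in f2]
    worst = 2.0  # 1 - (-1): largest possible value for unit vectors
    cost = [[[[worst] * r for _ in range(r)] for _ in range(n)]
            for _ in range(m)]
    for y in range(m):
        for x in range(n):
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < m and 0 <= xx < n:
                        dot = sum(a * b for a, b in zip(f1[y][x], f2[yy][xx]))
                        cost[y][x][dy + max_disp][dx + max_disp] = 1.0 - dot
    return cost


def quantize(cost, bits=8):
    """Rescale costs from [0, 2] and bin them to an unsigned integer range."""
    top = (1 << bits) - 1

    def to_bin(c):
        return min(top, max(0, int(round(c / 2.0 * top))))

    return [[[[to_bin(c) for c in row] for row in win] for win in line]
            for line in cost]


# Toy 1 x 2 image pair with two-dimensional features.
features = [[[1.0, 0.0], [0.0, 2.0]]]
volume = build_cost_volume(features, features, max_disp=1)
binned = quantize(volume)
```

In this toy example the zero-displacement entry for each pixel is approximately zero, since both images carry identical features. A practical implementation would typically vectorize the inner loops or distribute them across parallel threads.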

The cost volume processing circuit 208 is configured to perform a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image and to generate a low-resolution estimated optical flow field 230 comprising the estimated optical flow vectors. The Flow-SGM algorithm operates as follows.

A set of spatial neighbors of pixel p is denoted as N(p). The flow vector for pixel p is denoted as V_(p). The discrete energy of the optical flow field V of all pixels is defined as:

$E(V) = \sum_{p} \left( \sum_{q \in N(p)} P_{1} \left[ \|V_{p} - V_{q}\|_{1} = 1 \right] + \sum_{q \in N(p)} P_{2}^{p,q} \left[ \|V_{p} - V_{q}\|_{1} > 1 \right] + C(p, V_{p}) \right)$

where C is the cost volume entry for pixel p and flow vector V_(p), [·] denotes the Iverson bracket, and P₁ and P₂ ^(p,q) are regularization parameters. P₁ is set to a chosen fixed constant value and P₂ ^(p,q) is set as:

$P_{2}^{p,q} = \begin{cases} P_{2}/Q, & \text{if } |I_{p}^{1} - I_{q}^{1}| \geq T, \\ P_{2}, & \text{otherwise,} \end{cases}$

where the threshold T and the constants P₂ and Q are used to support edge-aware smoothing of the cost volume. Flow-SGM minimizes the energy E(V) by breaking the energy into independent paths, which can be globally minimized using dynamic programming. For each path a cost L_(r) is computed as:

$L_{r}(p, V_{p}) = C(p, V_{p}) + S(p, V_{p}) - \min_{i} \left( L_{r}(p - r, i) + P_{2}^{p,p-r} \right)$

where the contribution of the smoothness penalty S(p,V_(p)) is recursively computed as:

$S(p, V_{p}) = \min \left\{ L_{r}(p - r, V_{p}),\ \min_{\hat{v} \in N(V_{p})} \left( L_{r}(p - r, \hat{v}) + P_{1} \right),\ \min_{i} \left( L_{r}(p - r, i) + P_{2}^{p,p-r} \right) \right\}$

where r denotes the direction of traversal of the path. The penalty for switching by one discretization step is computed over a two-dimensional neighborhood. Multiple path directions r may be used and the corresponding costs L_(r)(p, V_(p)) are accumulated into a filtered cost volume L(p, V_(p)). In some embodiments, four cardinal path directions (two horizontal and two vertical) may be used. The final optical flow estimate is obtained by choosing the flow corresponding to the smallest cost in the filtered cost volume for each pixel. The flow may be computed in both directions and used as a consistency check to prune occluded or otherwise unreliable matches.
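For purposes of illustration, the path recurrence for L_(r) and S given above may be sketched for a single path direction as follows. This is a simplified sketch rather than a definitive implementation: it assumes the label neighborhood N(V_(p)) consists of the four flow labels at an L1 distance of one, it processes one path only (accumulation over several directions would sum the per-direction results), and the function names and toy values are hypothetical.

```python
def adaptive_p2(i_p, i_q, p2, q, t):
    """Edge-aware P2: reduced by the factor Q across strong intensity edges."""
    return p2 / q if abs(i_p - i_q) >= t else p2


def aggregate_path(costs, intensities, p1, p2, q, t):
    """Compute L_r along one path.

    costs[k] is the R x R grid C(p_k, .) of the k-th pixel on the path,
    indexed by flow label; intensities[k] is that pixel's intensity.
    """
    r = len(costs[0])
    prev = [row[:] for row in costs[0]]          # path start: L_r = C
    out = [prev]
    for k in range(1, len(costs)):
        pen2 = adaptive_p2(intensities[k], intensities[k - 1], p2, q, t)
        # min_i L_r(p - r, i) + P2, shared by every label of this pixel
        jump = min(min(row) for row in prev) + pen2
        cur = [[0.0] * r for _ in range(r)]
        for a in range(r):
            for b in range(r):
                best = min(prev[a][b], jump)     # same label, or large jump
                # two-dimensional label neighborhood, one step away
                for da, db in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    na, nb = a + da, b + db
                    if 0 <= na < r and 0 <= nb < r:
                        best = min(best, prev[na][nb] + p1)
                cur[a][b] = costs[k][a][b] + best - jump
        out.append(cur)
        prev = cur
    return out


def best_flow(grid, max_disp):
    """Winner-take-all: the (dy, dx) displacement with the smallest cost."""
    r = len(grid)
    a, b = min(((a, b) for a in range(r) for b in range(r)),
               key=lambda ab: grid[ab[0]][ab[1]])
    return a - max_disp, b - max_disp


# Toy path of three pixels with R = 3 (displacements -1..1).
strong = [[10.0] * 3 for _ in range(3)]
strong[1][2] = 0.0                       # clearly prefers flow (0, 1)
flat = [[5.0] * 3 for _ in range(3)]     # ambiguous on its own
agg = aggregate_path([strong, flat, strong], [100, 100, 100],
                     p1=1.0, p2=4.0, q=2.0, t=20)
print(best_flow(agg[1], max_disp=1))     # (0, 1)
```

In the toy example the middle pixel has no preference of its own, and the aggregated cost propagates the neighboring pixel's flow to it, which is the smoothing behavior the penalties are intended to produce.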

The up-sampling circuit 210 is configured to up-sample the low-resolution estimated optical flow field 230 back to the original resolution (or any desired higher resolution) using interpolation. In some embodiments, interpolation may be performed using Edge-preserving Interpolation of Correspondences for Optical Flow (EpicFlow).

The post processing circuit 212 is configured to in-fill occluded regions of the up-sampled high-resolution estimated optical flow field 232. The in-filling is based on extrapolation performed within homography-fitted segments of the high-resolution estimated optical flow field 232. This technique makes use of the fact that large segments of optical flow fields can generally be characterized by planar homographies.

In some embodiments, the extent of the planar regions is identified based on a segmentation hierarchy combined with a bottom-up fitting strategy. An Ultrametric Contour Map (UCM) is computed using a suitable fast boundary detector. Thresholding of the map at different levels exploits a property of the UCM that induces a segmentation hierarchy. A two-level hierarchy may be created by thresholding the UCM at levels t₁ and t₂, where t₂ is greater than t₁. Homographies may then be fitted to the semi-dense matches belonging to segments in the finer level of the hierarchy. The fitting of segments may be performed using Random Sample Consensus (RANSAC). The homography is considered to be a valid explanation for the flow in the segment if its inlier set is sufficiently large. Larger segments may then be further aggregated by considering segments at the coarse level to be candidates for homography inpainting if the number of inliers in their children is sufficiently large. For each such higher-level segment, a homography is fitted and considered valid if enough inliers are found. For each segment with a valid homography, the homography is used to extrapolate the optical flow within the segment. All other segments are inpainted, for example using EpicFlow. This technique has the advantage of not relying on semantic information, but rather using low-level edge cues, which generally broadens the applicability of the technique and enhances the synthesized flow field in the presence of large occluded regions.
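As a non-limiting illustration of the extrapolation step, the sketch below shows how a fitted homography may be checked against the semi-dense matches of a segment and, if accepted, used to synthesize flow for every pixel of that segment. The homography H is assumed to have been fitted already (the RANSAC fit itself is not shown), and the inlier tolerance `tol`, the acceptance fraction `min_fraction`, and all function names are hypothetical choices made for the example.

```python
def apply_homography(h, x, y):
    """Map point (x, y) through the 3 x 3 homography h (nested lists)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def flow_from_homography(h, x, y):
    """Flow vector implied by h at pixel (x, y)."""
    x2, y2 = apply_homography(h, x, y)
    return x2 - x, y2 - y


def inlier_fraction(h, matches, tol=1.0):
    """matches: list of ((x, y), (u, v)) semi-dense flow samples."""
    hits = 0
    for (x, y), (u, v) in matches:
        hu, hv = flow_from_homography(h, x, y)
        if abs(hu - u) <= tol and abs(hv - v) <= tol:
            hits += 1
    return hits / len(matches) if matches else 0.0


def inpaint_segment(h, pixels, matches, min_fraction=0.5):
    """Extrapolate flow over a segment, or return None if h is not valid."""
    if inlier_fraction(h, matches) < min_fraction:
        return None   # caller falls back to another inpainting method
    return {(x, y): flow_from_homography(h, x, y) for x, y in pixels}


# Toy segment whose motion is a pure translation by (2, -1).
H = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]
samples = [((0, 0), (2, -1)), ((5, 5), (2, -1)), ((1, 1), (9, 9))]
filled = inpaint_segment(H, [(3, 4), (7, 2)], samples)
```

Here two of the three sample matches agree with the homography, so the segment is accepted and the occluded pixels receive the flow implied by H.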

FIG. 4 illustrates a 4D cost volume 402, in accordance with certain embodiments of the present disclosure. The 4D cost volume 402 is shown to be of dimension M×N×R×R, as described previously, to store the distance metrics between each of the feature vectors in F₁ and the selected subset of feature vectors in F₂.

FIG. 5 is a block diagram of a training system 130 for the convolutional neural network, configured in accordance with certain embodiments of the present disclosure. The training system 130 is shown to include a triplet patch generation circuit 504 and a stochastic gradient descent (SGD) circuit 508.

The triplet patch generation circuit 504 is configured to receive training data 502 which includes pairs of images and ground truth data representing known optical flow between the image pairs. For each image pair, the triplet patch generation circuit 504 randomly samples an anchor patch x^(a) from the first image and uses the ground truth optical flow to generate a corresponding positive patch x^(p) in the second image. A negative patch x^(n) is also generated by randomly sampling a patch in the second image at a distance of between one and five pixels from the center of the positive patch x^(p). Thus, the anchor patch is known to be more similar to the positive patch than to the negative patch. A training triplet is formed as {x^(a), x^(p), x^(n)} and the process is repeated to generate large numbers (e.g., millions to hundreds of millions) of training triplets 506.
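The sampling of patch centers described above may be sketched as follows, by way of example only. The sketch returns the centers of the anchor, positive, and negative patches rather than the pixel data, assumes integer-valued ground truth flow stored as flow[y][x] = (dy, dx), and omits image-boundary handling; these simplifications, and the name `sample_triplet`, are assumptions made for the example.

```python
import math
import random


def sample_triplet(flow, rng, min_dist=1.0, max_dist=5.0):
    """Return (anchor, positive, negative) patch centers as (y, x) tuples."""
    height, width = len(flow), len(flow[0])
    ay, ax = rng.randrange(height), rng.randrange(width)
    dy, dx = flow[ay][ax]
    py, px = ay + dy, ax + dx            # positive: anchor moved by the flow
    reach = int(math.ceil(max_dist))
    while True:                          # rejection-sample the negative offset
        oy = rng.randint(-reach, reach)
        ox = rng.randint(-reach, reach)
        if min_dist <= math.hypot(oy, ox) <= max_dist:
            break
    return (ay, ax), (py, px), (py + oy, px + ox)


# Toy ground truth: every pixel moves two pixels to the right.
rng = random.Random(0)
toy_flow = [[(0, 2)] * 8 for _ in range(8)]
triplets = [sample_triplet(toy_flow, rng) for _ in range(100)]
```

Because the negative center lies only a few pixels from the positive center, the resulting triplets are comparatively hard examples, which is consistent with the sampling distance of one to five pixels described above.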

The stochastic gradient descent (SGD) circuit 508 is configured to generate a trained CNN for feature extraction 510 using a stochastic gradient descent algorithm applied to the generated training triplets. In some embodiments, other known training techniques may be used in light of the present disclosure. Additional computational efficiency may be achieved by performing the triplet patch generation in parallel with the SGD calculations, once the SGD pipeline has been primed with training triplets.

Methodology

FIG. 6 is a flowchart illustrating a methodology 600 for optical flow estimation, in accordance with certain embodiments of the present disclosure. As can be seen, the example methods include a number of phases and sub-processes, the sequence of which may vary from one embodiment to another. However, when considered in the aggregate, these phases and sub-processes form a process for optical flow estimation in accordance with certain of the embodiments disclosed herein. These embodiments can be implemented, for example, using the system architecture illustrated in FIGS. 1, 3, and 4 as described above. However, other system architectures can be used in other embodiments, as will be apparent in light of this disclosure. To this end, the correlation of the various functions shown in FIG. 6 to the specific components illustrated in the other figures is not intended to imply any structural and/or use limitations. Rather, other embodiments may include, for example, varying degrees of integration wherein multiple functionalities are effectively performed by one system. For example, in an alternative embodiment a single module can be used to perform all of the functions of method 600. Thus, other embodiments may have fewer or more modules and/or sub-modules depending on the granularity of implementation. In still other embodiments, the methodology depicted can be implemented as a computer program product including one or more non-transitory machine readable mediums that when executed by one or more processors cause the methodology to be carried out. Numerous variations and alternative configurations will be apparent in light of this disclosure.

As illustrated in FIG. 6, in an embodiment, method 600 for optical flow estimation commences by extracting, at operation 610, a first set of feature vectors from a first image. Each feature vector of the first set is associated with a pixel of the first image. Next, at operation 620, a second set of feature vectors is extracted from a second image. Each feature vector of the second set is associated with a pixel of the second image.

At operation 630, a 4-dimensional (4D) cost volume is constructed to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors. In some embodiments, the extracted feature vectors are normalized to unity length, and the distance metric is calculated as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors. In some further embodiments, the distance metrics are rescaled and binned to a selected integer range, such as, for example, eight bits.

At operation 640, a flow-semi-global matching (Flow-SGM) is performed on the 4D cost volume to estimate an optical flow vector for pixels of the first image. An estimated optical flow field may then be constructed by associating each optical flow vector with the corresponding image pixel.

Of course, in some embodiments, additional operations may be performed, as previously described in connection with the system. For example, occluded regions of the estimated optical flow field may be filled in using extrapolation techniques as described above. Additionally, the feature vector extraction may be performed by a trained convolutional neural network (CNN), the training based on training data that includes pairs of training images and associated ground truth optical flow vectors. The training may further employ a stochastic gradient descent operation performed on the training data.

In some further embodiments, the estimated optical flow vectors may be provided to a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, or a computer vision application.

Example System

FIG. 7 illustrates an example system 700 to perform optical flow estimation, configured in accordance with certain embodiments of the present disclosure. In some embodiments, system 700 comprises an optical flow estimation platform 710 which may host, or otherwise be incorporated into a personal computer, workstation, server system, laptop computer, ultra-laptop computer, tablet, touchpad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone and PDA, smart device (for example, smartphone or smart tablet), mobile internet device (MID), messaging device, data communication device, imaging device, and so forth. Any combination of different devices may be used in certain embodiments.

In some embodiments, platform 710 may comprise any combination of a processor 720, a memory 730, optical flow estimation system 110, a network interface 740, an input/output (I/O) system 750, a user interface 760, an imaging source 762, and a storage system 770. As can be further seen, a bus and/or interconnect 792 is also provided to allow for communication between the various components listed above and/or other components not shown. Platform 710 can be coupled to a network 794 through network interface 740 to allow for communications with other computing devices, platforms, or resources. Other componentry and functionality not reflected in the block diagram of FIG. 7 will be apparent in light of this disclosure, and it will be appreciated that other embodiments are not limited to any particular hardware configuration.

Processor 720 can be any suitable processor, and may include one or more coprocessors or controllers, such as an audio processor or a graphics processing unit to assist in control and processing operations associated with system 700. In some embodiments, the processor 720 may be implemented as any number of processor cores. The processor (or processor cores) may be any type of processor, such as, for example, a micro-processor, an embedded processor, a digital signal processor (DSP), a graphics processor (GPU), a network processor, a field programmable gate array or other device configured to execute code. The processors may be multithreaded cores in that they may include more than one hardware thread context (or “logical processor”) per core. The processor(s) may be configured to provide parallel processing capability such that, for example, different portions of the disclosed algorithms can be executed simultaneously, and/or multiple segments of data can be processed simultaneously. Processor 720 may be implemented as a complex instruction set computer (CISC) or a reduced instruction set computer (RISC) processor. In some embodiments, processor 720 may be configured as an x86 instruction set compatible processor.

Memory 730 can be implemented using any suitable type of digital storage including, for example, flash memory and/or random access memory (RAM). In some embodiments, the memory 730 may include various layers of memory hierarchy and/or memory caches as are known to those of skill in the art. Memory 730 may be implemented as a volatile memory device such as, but not limited to, a RAM, dynamic RAM (DRAM), or static RAM (SRAM) device. Storage system 770 may be implemented as a non-volatile storage device such as, but not limited to, one or more of a hard disk drive (HDD), a solid-state drive (SSD), a universal serial bus (USB) drive, an optical disk drive, tape drive, an internal storage device, an attached storage device, flash memory, battery backed-up synchronous DRAM (SDRAM), and/or a network accessible storage device. In some embodiments, storage 770 may comprise technology to provide increased storage performance and enhanced protection for valuable digital media when multiple hard drives are included.

Processor 720 may be configured to execute an Operating System (OS) 780 which may comprise any suitable operating system, such as Google Android (Google Inc., Mountain View, Calif.), Microsoft Windows (Microsoft Corp., Redmond, Wash.), Apple OS X (Apple Inc., Cupertino, Calif.), Linux, or a real-time operating system (RTOS). As will be appreciated in light of this disclosure, the techniques provided herein can be implemented without regard to the particular operating system provided in conjunction with system 700, and therefore may also be implemented using any suitable existing or subsequently-developed platform.

Network interface circuit 740 can be any appropriate network chip or chipset which allows for wired and/or wireless connection between other components of computer system 700 and/or network 794, thereby enabling system 700 to communicate with other local and/or remote computing systems, servers, cloud-based servers, and/or other resources. Wired communication may conform to existing (or yet to be developed) standards, such as, for example, Ethernet. Wireless communication may conform to existing (or yet to be developed) standards, such as, for example, cellular communications including LTE (Long Term Evolution), Wireless Fidelity (Wi-Fi), Bluetooth, and/or Near Field Communication (NFC). Exemplary wireless networks include, but are not limited to, wireless local area networks, wireless personal area networks, wireless metropolitan area networks, cellular networks, and satellite networks.

I/O system 750 may be configured to interface between various I/O devices and other components of computer system 700. I/O devices may include, but not be limited to, user interface 760, and an imaging source 762. User interface 760 may include devices (not shown) such as a display element, touchpad, keyboard, mouse, microphone, and speaker, etc. Imaging source 762 may be a camera, a video camera, a scanner, a database of images, or any other suitable source. I/O system 750 may include a graphics subsystem configured to perform processing of images for rendering on a display element. Graphics subsystem may be a graphics processing unit or a visual processing unit (VPU), for example. An analog or digital interface may be used to communicatively couple graphics subsystem and the display element. For example, the interface may be any of a high definition multimedia interface (HDMI), DisplayPort, wireless HDMI, and/or any other suitable interface using wireless high definition compliant techniques. In some embodiments, the graphics subsystem could be integrated into processor 720 or any chipset of platform 710.

It will be appreciated that in some embodiments, the various components of the system 700 may be combined or integrated in a system-on-a-chip (SoC) architecture. In some embodiments, the components may be hardware components, firmware components, software components or any suitable combination of hardware, firmware or software.

Optical flow estimation system 110 is configured to provide the capability for optical flow estimation between a pair of images, using 4D cost volume processing, as described previously. Optical flow estimation system 110 may include any or all of the circuits/components illustrated in FIGS. 1, 3, and 4, as described above. These components can be implemented or otherwise used in conjunction with a variety of suitable software and/or hardware that is coupled to or that otherwise forms a part of platform 710. These components can additionally or alternatively be implemented or otherwise used in conjunction with user I/O devices that are capable of providing information to, and receiving information and commands from, a user.

In some embodiments, these circuits may be installed local to system 700, as shown in the example embodiment of FIG. 7. Alternatively, system 700 can be implemented in a client-server arrangement wherein at least some functionality associated with these circuits is provided to system 700 using an applet, such as a JavaScript applet, or other downloadable module. Such a remotely accessible module or sub-module can be provisioned in real-time, in response to a request from a client computing system for access to a given server having resources that are of interest to the user of the client computing system. In such embodiments, the server can be local to network 794 or remotely coupled to network 794 by one or more other networks and/or communication channels. In some cases, access to resources on a given network or computing system may require credentials such as usernames, passwords, and/or compliance with any other suitable security mechanism.

In various embodiments, system 700 may be implemented as a wireless system, a wired system, or a combination of both. When implemented as a wireless system, system 700 may include components and interfaces suitable for communicating over a wireless shared media, such as one or more antennae, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth. An example of wireless shared media may include portions of a wireless spectrum, such as the radio frequency spectrum and so forth. When implemented as a wired system, system 700 may include components and interfaces suitable for communicating over wired communications media, such as input/output adapters, physical connectors to connect the input/output adaptor with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and so forth. Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted pair wire, coaxial cable, fiber optics, and so forth.

Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (for example, transistors, resistors, capacitors, inductors, and so forth), integrated circuits, ASICs, programmable logic devices, digital signal processors, FPGAs, logic gates, registers, semiconductor devices, chips, microchips, chipsets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces, instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power level, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints.

Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still cooperate or interact with each other.

The various embodiments disclosed herein can be implemented in various forms of hardware, software, firmware, and/or special purpose processors. For example, in one embodiment at least one non-transitory computer readable storage medium has instructions encoded thereon that, when executed by one or more processors, cause one or more of the optical flow estimation methodologies disclosed herein to be implemented. The instructions can be encoded using a suitable programming language, such as C, C++, object oriented C, Java, JavaScript, Visual Basic .NET, Beginner's All-Purpose Symbolic Instruction Code (BASIC), or alternatively, using custom or proprietary instruction sets. The instructions can be provided in the form of one or more computer software applications and/or applets that are tangibly embodied on a memory device, and that can be executed by a computer having any suitable architecture. In one embodiment, the system can be hosted on a given website and implemented, for example, using JavaScript or another suitable browser-based technology. For instance, in certain embodiments, the system may leverage processing resources provided by a remote computer system accessible via network 794. In other embodiments, the functionalities disclosed herein can be incorporated into other software applications, such as image perception systems, robotics, and virtual reality applications. The computer software applications disclosed herein may include any number of different modules, sub-modules, or other components of distinct functionality, and can provide information to, or receive information from, still other components. These modules can be used, for example, to communicate with input and/or output devices such as a display screen, a touch sensitive surface, a printer, and/or any other suitable device. Other componentry and functionality not reflected in the illustrations will be apparent in light of this disclosure, and it will be appreciated that other embodiments are not limited to any particular hardware or software configuration. Thus, in other embodiments system 700 may comprise additional, fewer, or alternative subcomponents as compared to those included in the example embodiment of FIG. 7.

The aforementioned non-transitory computer readable medium may be any suitable medium for storing digital information, such as a hard drive, a server, a flash memory, and/or random access memory (RAM), or a combination of memories. In alternative embodiments, the components and/or modules disclosed herein can be implemented with hardware, including gate level logic such as a field-programmable gate array (FPGA), or alternatively, a purpose-built semiconductor such as an application-specific integrated circuit (ASIC). Still other embodiments may be implemented with a microcontroller having a number of input/output ports for receiving and outputting data, and a number of embedded routines for carrying out the various functionalities disclosed herein. It will be apparent that any suitable combination of hardware, software, and firmware can be used, and that other embodiments are not limited to any particular system architecture.

Some embodiments may be implemented, for example, using a machine readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, process, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium, and/or storage unit, such as memory, removable or non-removable media, erasable or non-erasable media, writeable or rewriteable media, digital or analog media, hard disk, floppy disk, compact disk read only memory (CD-ROM), compact disk recordable (CD-R) memory, compact disk rewriteable (CD-RW) memory, optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of digital versatile disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high level, low level, object oriented, visual, compiled, and/or interpreted programming language.

Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like refer to the action and/or process of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (for example, electronic) within the registers and/or memory units of the computer system into other data similarly represented as physical quantities within the registers, memory units, or other such information storage, transmission, or displays of the computer system. The embodiments are not limited in this context.

The terms “circuit” or “circuitry,” as used in any embodiment herein, are functional and may comprise, for example, singly or in any combination, hardwired circuitry, programmable circuitry such as computer processors comprising one or more individual instruction processing cores, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry. The circuitry may include a processor and/or controller configured to execute one or more instructions to perform one or more operations described herein. The instructions may be embodied as, for example, an application, software, firmware, etc. configured to cause the circuitry to perform any of the aforementioned operations. Software may be embodied as a software package, code, instructions, instruction sets and/or data recorded on a computer-readable storage device. Software may be embodied or implemented to include any number of processes, and processes, in turn, may be embodied or implemented to include any number of threads, etc., in a hierarchical fashion. Firmware may be embodied as code, instructions or instruction sets and/or data that are hard-coded (e.g., nonvolatile) in memory devices. The circuitry may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), an application-specific integrated circuit (ASIC), a system-on-a-chip (SoC), desktop computers, laptop computers, tablet computers, servers, smart phones, etc. Other embodiments may be implemented as software executed by a programmable control device. In such cases, the terms “circuit” or “circuitry” are intended to include a combination of software and hardware such as a programmable control device or a processor capable of executing the software. As described herein, various embodiments may be implemented using hardware elements, software elements, or any combination thereof. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth.

Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by an ordinarily-skilled artisan, however, that the embodiments may be practiced without these specific details. In other instances, well known operations, components and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments. In addition, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described herein. Rather, the specific features and acts described herein are disclosed as example forms of implementing the claims.

Further Example Embodiments

The following examples pertain to further embodiments, from which numerous permutations and configurations will be apparent.

Example 1 is a processor-implemented method for optical flow estimation. The method comprises: extracting, by a processor-based system, a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; extracting, by the processor-based system, a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; constructing, by the processor-based system, a 4-dimensional (4D) cost volume to store distance metrics between one or more feature vectors of the first set of feature vectors and one or more feature vectors of the second set of feature vectors; and performing, by the processor-based system, a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image.
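By way of non-limiting illustration, the matching operation of Example 1 can be pictured with a single-path cost aggregation of the kind used in conventional semi-global matching, here applied to two-dimensional flow labels. This is a sketch under stated assumptions and not the disclosed Flow-SGM itself: the function name aggregate_path, the penalties p1 and p2, and the definition of a "small" flow change as a step of one in either component are all hypothetical choices made for the example. A full semi-global scheme would sum such aggregations over several path directions.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one scan path (illustrative only).

    costs: list with one dict per pixel on the path, mapping a candidate
           flow (du, dv) to its matching cost.
    p1:    assumed penalty for a flow change of one step between neighbors.
    p2:    assumed penalty for any larger flow change.
    """
    aggregated = [dict(costs[0])]
    for cost in costs[1:]:
        prev = aggregated[-1]
        prev_min = min(prev.values())
        current = {}
        for flow, c in cost.items():
            # Start from the "arbitrary jump" option, then try cheaper ones.
            best = prev_min + p2
            for prev_flow, prev_cost in prev.items():
                step = max(abs(flow[0] - prev_flow[0]),
                           abs(flow[1] - prev_flow[1]))
                if step == 0:
                    best = min(best, prev_cost)
                elif step == 1:
                    best = min(best, prev_cost + p1)
            # Subtracting prev_min keeps the values from growing along the path.
            current[flow] = c + best - prev_min
        aggregated.append(current)
    return aggregated


# Two pixels on a path; the second pixel's raw minimum is the outlier (3, 0).
path_costs = [
    {(0, 0): 0, (1, 0): 5, (3, 0): 5},
    {(0, 0): 3, (1, 0): 6, (3, 0): 2},
]
agg = aggregate_path(path_costs, p1=1, p2=4)
smoothed = min(agg[1], key=agg[1].get)
```

In this toy input the second pixel would pick flow (3, 0) from its raw costs alone, whereas the aggregated costs favor (0, 0), which agrees with its neighbor.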

Example 2 includes the subject matter of Example 1, further comprising normalizing the extracted feature vectors to unity length, and calculating the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.
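The relationship relied on in Example 2 follows from the identity ||a - b||^2 = 2 - 2(a · b), which holds for any two unit-length vectors a and b. The following minimal sketch, with hypothetical helper names normalize and distance_from_dot, shows the computation on plain Python lists and compares it with a directly computed Euclidean distance.

```python
import math


def normalize(v):
    """Scale a feature vector to unity length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def distance_from_dot(a, b):
    """Euclidean distance between two unit vectors via their dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot))


f1 = normalize([3.0, 4.0])
f2 = normalize([4.0, 3.0])
d = distance_from_dot(f1, f2)
direct = math.dist(f1, f2)  # reference value computed the usual way
```

Because only a dot product is needed per pair, the distances for many candidate pairs can be obtained with matrix multiplication in an optimized implementation.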

Example 3 includes the subject matter of Examples 1 or 2, further comprising rescaling and binning the distance metrics of the 4D cost volume to a selected integer range.
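As a non-limiting sketch of the rescaling and binning of Example 3, the following assumes a linear mapping of the observed cost range onto the integers 0 through levels - 1. The function name rescale_and_bin, the default of 256 levels, and the min/max-based scaling are illustrative assumptions; the example itself only calls for "a selected integer range."

```python
def rescale_and_bin(costs, levels=256):
    """Linearly map float costs onto integer bins 0..levels-1."""
    lo, hi = min(costs), max(costs)
    if hi == lo:
        # All costs identical: everything falls in the lowest bin.
        return [0 for _ in costs]
    scale = (levels - 1) / (hi - lo)
    return [int(round((c - lo) * scale)) for c in costs]


binned = rescale_and_bin([0.0, 0.5, 1.0, 2.0], levels=256)
```

With 256 levels each entry fits in a single byte, which reduces the memory footprint of the cost volume relative to floating-point storage.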

Example 4 includes the subject matter of any of Examples 1-3, wherein the feature vector extraction is performed by a trained convolutional neural network (CNN), the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.

Example 5 includes the subject matter of any of Examples 1-4, wherein the training further comprises performing a stochastic gradient descent operation on the training data.

Example 6 includes the subject matter of any of Examples 1-5, further comprising: down-sampling the first image and the second image, from an original resolution to a selected lower resolution; generating an estimated optical flow field comprising the estimated optical flow vectors; and up-sampling the estimated optical flow field to the original resolution using interpolation.
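The up-sampling step of Example 6 can be sketched as follows, assuming bilinear interpolation and an integer resolution ratio. The function name upsample_flow, the sampling convention (output coordinate divided by the factor, clamped to the grid), and the multiplication of each vector by the factor are assumptions of this sketch; the multiplication reflects that a displacement measured in low-resolution pixels spans proportionally more pixels at the original resolution.

```python
def upsample_flow(flow, factor):
    """Bilinearly up-sample a flow field and rescale its vectors.

    flow:   H x W grid (list of rows) of (u, v) tuples at low resolution.
    factor: assumed integer ratio between original and low resolution.
    """
    h, w = len(flow), len(flow[0])
    out = []
    for oy in range(h * factor):
        sy = min(oy / factor, h - 1)      # source row, clamped to the grid
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        wy = sy - y0
        row = []
        for ox in range(w * factor):
            sx = min(ox / factor, w - 1)  # source column, clamped
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            wx = sx - x0
            vec = []
            for k in range(2):
                top = (1 - wx) * flow[y0][x0][k] + wx * flow[y0][x1][k]
                bottom = (1 - wx) * flow[y1][x0][k] + wx * flow[y1][x1][k]
                # Scale the interpolated displacement to full-resolution pixels.
                vec.append(factor * ((1 - wy) * top + wy * bottom))
            row.append(tuple(vec))
        out.append(row)
    return out


low_res = [[(1.0, 0.0), (3.0, 0.0)]]
full_res = upsample_flow(low_res, 2)
```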

Example 7 includes the subject matter of any of Examples 1-6, further comprising post processing of the up-sampled estimated optical flow field to in-fill occluded regions, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.

Example 8 includes the subject matter of any of Examples 1-7, further comprising providing the estimated optical flow vectors to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.

Example 9 includes the subject matter of any of Examples 1-8, wherein the constructing comprises constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors.
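One way to picture the 4D cost volume of Example 9 is as an array indexed by pixel position (y, x) and candidate displacement (dv, du), where the "selected subset" is taken here to be a square search window around each pixel. That window, its radius parameter, the out-of-bounds cost of 2.0 (the largest possible distance between unit vectors), and the helper names below are assumptions of this sketch. The final winner-take-all lookup is included only to inspect the volume and is not a substitute for Flow-SGM.

```python
import math


def build_cost_volume(feat1, feat2, radius, out_of_bounds_cost=2.0):
    """Return volume[y][x][dv + radius][du + radius] of feature distances.

    feat1, feat2: H x W grids of unit-length feature vectors.
    """
    height, width = len(feat1), len(feat1[0])
    side = 2 * radius + 1
    volume = [[[[out_of_bounds_cost] * side for _ in range(side)]
               for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            a = feat1[y][x]
            for dv in range(-radius, radius + 1):
                for du in range(-radius, radius + 1):
                    ty, tx = y + dv, x + du
                    if 0 <= ty < height and 0 <= tx < width:
                        dot = sum(p * q for p, q in zip(a, feat2[ty][tx]))
                        dist = math.sqrt(max(0.0, 2.0 - 2.0 * dot))
                        volume[y][x][dv + radius][du + radius] = dist
    return volume


def best_displacement(volume, y, x, radius):
    """Winner-take-all (dv, du) for one pixel; for inspection only."""
    side = 2 * radius + 1
    cells = volume[y][x]
    _, dv, du = min((cells[i][j], i - radius, j - radius)
                    for i in range(side) for j in range(side))
    return dv, du


def unit(angle):
    return [math.cos(angle), math.sin(angle)]


H, W, R = 2, 4, 1
feat1 = [[unit(0.5 * (y * W + x)) for x in range(W)] for y in range(H)]
# Second image: first image content shifted one pixel to the right.
feat2 = [[feat1[y][x - 1] if x >= 1 else unit(5.0) for x in range(W)]
         for y in range(H)]
volume = build_cost_volume(feat1, feat2, R)
best = best_displacement(volume, 0, 1, R)
```

Restricting the window keeps the volume at H x W x (2r + 1) x (2r + 1) entries rather than one entry for every pair of pixels.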

Example 10 is a system for optical flow estimation. The system comprises: a feature extraction circuit to extract a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; and to extract a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; a cost volume construction circuit to construct a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors; and a cost volume processing circuit to perform a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image and to generate an estimated optical flow field comprising the estimated optical flow vectors.

Example 11 includes the subject matter of Example 10, wherein the cost volume construction circuit is further to normalize the extracted feature vectors to unity length, and calculate the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.

Example 12 includes the subject matter of Examples 10 or 11, wherein the cost volume construction circuit is further to rescale and bin the distance metrics of the 4D cost volume to a selected integer range.

Example 13 includes the subject matter of any of Examples 10-12, wherein the feature extraction circuit further comprises a trained convolutional neural network (CNN) to extract the feature vectors, the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.

Example 14 includes the subject matter of any of Examples 10-13, further comprising a training system to train the CNN based on application of a stochastic gradient descent to the training data.

Example 15 includes the subject matter of any of Examples 10-14, further comprising: an image down-sampling circuit to down-sample the first image and the second image, from an original resolution to a selected lower resolution; and an up-sampling circuit to up-sample the estimated optical flow field to the original resolution using interpolation.

Example 16 includes the subject matter of any of Examples 10-15, further comprising a post-processing circuit to in-fill occluded regions of the up-sampled estimated optical flow field, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.

Example 17 includes the subject matter of any of Examples 10-16, wherein the post-processing circuit is further to provide the estimated optical flow vectors to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.

Example 18 is at least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for optical flow estimation. The operations comprise: extracting a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; extracting a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors; and performing a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image or the second image or both of the first image and second image.

Example 19 includes the subject matter of Example 18, further comprising the operations of normalizing the extracted feature vectors to unity length, and calculating the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.

Example 20 includes the subject matter of Examples 18 or 19, further comprising the operations of rescaling and binning the distance metrics of the 4D cost volume to a selected integer range.

Example 21 includes the subject matter of any of Examples 18-20, wherein the feature vector extraction is performed by a trained convolutional neural network (CNN), the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.

Example 22 includes the subject matter of any of Examples 18-21, wherein the training further comprises the operation of performing a stochastic gradient descent on the training data.

Example 23 includes the subject matter of any of Examples 18-22, further comprising the operations of: down-sampling the first image and the second image, from an original resolution to a selected lower resolution; generating an estimated optical flow field comprising the estimated optical flow vectors; and up-sampling the estimated optical flow field to the original resolution using interpolation.

Example 24 includes the subject matter of any of Examples 18-23, further comprising the operation of post processing of the up-sampled estimated optical flow field to in-fill occluded regions, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.

Example 25 includes the subject matter of any of Examples 18-24, further comprising the operation of providing the estimated optical flow vectors to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.

Example 26 is a system for optical flow estimation. The system comprises: means for extracting a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; means for extracting a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; means for constructing a 4-dimensional (4D) cost volume to store distance metrics between one or more feature vectors of the first set of feature vectors and one or more feature vectors of the second set of feature vectors; and means for performing a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image.

Example 27 includes the subject matter of Example 26, further comprising means for normalizing the extracted feature vectors to unity length, and means for calculating the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.

Example 28 includes the subject matter of Examples 26 or 27, further comprising means for rescaling and binning the distance metrics of the 4D cost volume to a selected integer range.

Example 29 includes the subject matter of any of Examples 26-28, wherein the feature vector extraction is performed by a trained convolutional neural network (CNN), the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.

Example 30 includes the subject matter of any of Examples 26-29, wherein the training further comprises means for performing a stochastic gradient descent operation on the training data.

Example 31 includes the subject matter of any of Examples 26-30, further comprising: means for down-sampling the first image and the second image, from an original resolution to a selected lower resolution; means for generating an estimated optical flow field comprising the estimated optical flow vectors; and means for up-sampling the estimated optical flow field to the original resolution using interpolation.

Example 32 includes the subject matter of any of Examples 26-31, further comprising means for post processing of the up-sampled estimated optical flow field to in-fill occluded regions, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.

Example 33 includes the subject matter of any of Examples 26-32, further comprising means for providing the estimated optical flow vectors to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.

Example 34 includes the subject matter of any of Examples 26-33, wherein the constructing comprises means for constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors.

The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents. Various features, aspects, and embodiments have been described herein. The features, aspects, and embodiments are susceptible to combination with one another as well as to variation and modification, as will be understood by those having skill in the art. The present disclosure should, therefore, be considered to encompass such combinations, variations, and modifications. It is intended that the scope of the present disclosure be limited not by this detailed description, but rather by the claims appended hereto. Future filed applications claiming priority to this application may claim the disclosed subject matter in a different manner, and may generally include any set of one or more elements as variously disclosed or otherwise demonstrated herein.

What is claimed is:
 1. A processor-implemented method for optical flow estimation, the method comprising: extracting, by a processor-based system, a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; extracting, by the processor-based system, a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; constructing, by the processor-based system, a 4-dimensional (4D) cost volume to store distance metrics between one or more feature vectors of the first set of feature vectors and one or more feature vectors of the second set of feature vectors; and performing, by the processor-based system, a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image.
 2. The method of claim 1, further comprising normalizing the extracted feature vectors to unity length, and calculating each of the distance metrics as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.
 3. The method of claim 1, further comprising rescaling and binning the distance metrics of the 4D cost volume to a selected integer range.
 4. The method of claim 1, wherein the feature vector extraction is performed by a trained convolutional neural network (CNN), the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.
 5. The method of claim 4, wherein the training further comprises performing a stochastic gradient descent operation on the training data.
 6. The method of claim 1, further comprising: down-sampling the first image and the second image, from an original resolution to a selected lower resolution; generating an estimated optical flow field comprising the estimated optical flow vector; and up-sampling the estimated optical flow field to the original resolution using interpolation.
 7. The method of claim 6, further comprising post processing of the up-sampled estimated optical flow field to in-fill occluded regions, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.
 8. The method of claim 1, further comprising providing the estimated optical flow vector to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.
 9. The method of claim 1, wherein the constructing comprises constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors.
 10. A system for optical flow estimation, the system comprising: a feature extraction circuit to extract a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; and to extract a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; a cost volume construction circuit to construct a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors; and a cost volume processing circuit to perform a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image and to generate an estimated optical flow field comprising the estimated optical flow vector.
 11. The system of claim 10, wherein the cost volume construction circuit is further to normalize the extracted feature vectors to unity length, and calculate the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.
 12. The system of claim 10, wherein the cost volume construction circuit is further to rescale and bin the distance metrics of the 4D cost volume to a selected integer range.
 13. The system of claim 10, wherein the feature extraction circuit further comprises a trained convolutional neural network (CNN) to extract the feature vectors, the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.
 14. The system of claim 13, further comprising a training system to train the CNN based on application of a stochastic gradient descent to the training data.
 15. The system of claim 10, further comprising: an image down-sampling circuit to down-sample the first image and the second image, from an original resolution to a selected lower resolution; and an up-sampling circuit to up-sample the estimated optical flow field to the original resolution using interpolation.
 16. The system of claim 15, further comprising a post-processing circuit to in-fill occluded regions of the up-sampled estimated optical flow field, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.
 17. The system of claim 10, wherein the post-processing circuit is further to provide the estimated optical flow vector to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.
 18. At least one non-transitory computer readable storage medium having instructions encoded thereon that, when executed by one or more processors, result in the following operations for optical flow estimation, the operations comprising: extracting a first set of feature vectors from a first image, each feature vector of the first set associated with a pixel of the first image; extracting a second set of feature vectors from a second image, each feature vector of the second set associated with a pixel of the second image; constructing a 4-dimensional (4D) cost volume to store a distance metric between each feature vector of the first set of feature vectors and a selected subset of feature vectors of the second set of feature vectors; and performing a flow-semi-global matching (Flow-SGM) on the 4D cost volume to estimate an optical flow vector for pixels of the first image or the second image or both of the first image and second image.
 19. The computer readable storage medium of claim 18, further comprising the operations of normalizing the extracted feature vectors to unity length, and calculating the distance metric as a Euclidean distance using a vector dot product operation applied to the normalized extracted feature vectors.
 20. The computer readable storage medium of claim 18, further comprising the operations of rescaling and binning the distance metrics of the 4D cost volume to a selected integer range.
 21. The computer readable storage medium of claim 18, wherein the feature vector extraction is performed by a trained convolutional neural network (CNN), the training based on training data comprising pairs of training images and associated ground truth optical flow vectors.
 22. The computer readable storage medium of claim 21, wherein the training further comprises the operation of performing a stochastic gradient descent on the training data.
 23. The computer readable storage medium of claim 18, further comprising the operations of: down-sampling the first image and the second image, from an original resolution to a selected lower resolution; generating an estimated optical flow field comprising the estimated optical flow vector; and up-sampling the estimated optical flow field to the original resolution using interpolation.
 24. The computer readable storage medium of claim 23, further comprising the operation of post processing of the up-sampled estimated optical flow field to in-fill occluded regions, the in-filling based on extrapolation performed within homography fitted segments of the up-sampled estimated optical flow field.
 25. The computer readable storage medium of claim 18, further comprising the operation of providing the estimated optical flow vector to at least one of a video segmentation application, a motion detection application, an object tracking application, an action recognition application, an autonomous driving system, a computer navigation application, and a computer vision application.